[Serve][2/N] Implement AcceleratorConfig to enable custom scheduling logic for accelerators with Serve deployments#63179
Code Review
This pull request introduces a structured AcceleratorConfig for Ray Serve to support TPU slice reservations. It implements a new placement group management layer (_ReplicaPlacementGroup) that handles accelerator-specific lifecycle tasks, such as releasing head placement groups after scheduling. The changes span the Serve controller, deployment scheduler, and LLM engine configurations to enable per-host TPU bundle allocation. Review feedback highlights a critical runtime error where an invalid label_selector is passed to actor options, identifies missing logic for passing user-defined bundle label selectors, and notes a documentation mismatch in the TPU utility classes.
Please break up the PR, at least into the Serve parts first and then the LLM changes; that would make it easier to review.
… tests Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…t `bundle_label_selector` Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Sounds good, I'm going to make this PR the Serve changes (it includes changes from #63171 for now so that tests work, but this should merge first). #63216 will include the changes from this PR, so that integration tests work, plus the LLM-specific changes.
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Will fix merge conflicts once #63177 is merged.
Took a first pass -- could we break the PR down into the following?
- Config / frontend only: ensure that `AcceleratorConfig` is plumbed through `@serve.deployment`, `Deployment.options`, `DeploymentSchema` (declarative YAMLs), and the protobuf surfaces. I think we're missing some plumbing in this PR.
- Scheduler and state reconciliation
Tests haven't been reviewed.
Some other questions: Lifecycle of
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
…strings, change from Dev API to PublicAPI, and fix other comments Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Yeah, that's possible and is a valid issue. The current behavior would allow a slice PG call to discover a seemingly available TPU head, attempt to reserve it, and leave the worker PG hanging indefinitely.
The controller would just time out waiting for the PG to become ready.
This wouldn't work because the two PGs would be in contention for the same resource. If we release the head_pg first, we risk a race with another slice claiming it.
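The race mentioned above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Cluster`, `reserve_slice`, and `competitor_claims_during_wait` are hypothetical names invented for illustration and are not part of Ray or this PR. It models a single TPU head that one slice can hold at a time, and shows that releasing the head reservation before the worker PG is ready opens a window in which another slice can claim it.

```python
from typing import Callable, Optional


class Cluster:
    """Toy model (hypothetical): one TPU head, held by at most one slice."""

    def __init__(self) -> None:
        self.head_owner: Optional[str] = None

    def claim_head(self, owner: str) -> bool:
        # Succeeds only if nobody currently holds the head.
        if self.head_owner is None:
            self.head_owner = owner
            return True
        return False

    def release_head(self, owner: str) -> None:
        # No-op unless `owner` is the current holder.
        if self.head_owner == owner:
            self.head_owner = None


def reserve_slice(
    cluster: Cluster,
    name: str,
    wait_for_worker_pg: Callable[[], bool],
    release_head_first: bool,
) -> bool:
    """Claim the head, wait for the worker PG, then release the head."""
    if not cluster.claim_head(name):
        return False
    if release_head_first:
        # Releasing here leaves the head unprotected while we wait.
        cluster.release_head(name)
    ready = wait_for_worker_pg()
    cluster.release_head(name)
    return ready


def competitor_claims_during_wait(release_head_first: bool) -> bool:
    """Return True if a second slice manages to claim the head mid-wait."""
    cluster = Cluster()
    seen = []

    def wait() -> bool:
        # Simulate another slice racing for the head while we wait.
        seen.append(cluster.claim_head("slice-b"))
        return True

    reserve_slice(cluster, "slice-a", wait, release_head_first)
    return seen[0]
```

In this model, `competitor_claims_during_wait(True)` lets the second slice in, while `competitor_claims_during_wait(False)` does not, which matches the ordering in the PR description (release only after worker PG readiness).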
Yeah, I think we should fix this; the simplest solution is to just go with what
Should be fixed with 26ae3e2.
… change default to SPREAD strategy Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 26ae3e2. Configure here.
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Addressed the outstanding design-related comments and bugs; will work on splitting this PR into two smaller ones.
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Opened up #63581 to address #63179 (review), adding just the
Nice, thank you! Taking a look now.
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>

Description
This PR introduces a structured `AcceleratorConfig` (starting with `TPUAcceleratorConfig`) for Ray Serve deployments to support advanced accelerator provisioning. Deployments with `accelerator_config` set use a per-replica PG creation path that dispatches to `slice_placement_group` for TPU. Gang scheduling is bypassed for these deployments: `SlicePlacementGroup` is itself a gang-scheduling primitive, so layering Gang PG on top would solve the same problem twice.

Specific Changes:
- `AcceleratorConfig` and `TPUAcceleratorConfig` Pydantic models defining hardware requirements (topology, version, chips per VM).
- `accelerator_config` added to serve.proto and threaded through `ReplicaSchedulingRequest` and `CreatePlacementGroupRequest`.
- `ReplicaPlacementGroup` wrapper delegating `shutdown()` and `release_head_pgs()` to the underlying TPU-specific PG.
- `_create_replica_placement_group` as the internal scheduler entry point; it dispatches on `accelerator_config` and wraps the result. `_default_create_placement_group`'s public signature is unchanged, so external `create_placement_group_fn_override` users keep working.
- `ReplicaPlacementGroup.shutdown()` on teardown and `release_reservation_holders()` after worker PG readiness.

Edit: I scoped this PR way down to not include unrelated Gang PG changes, which can be in a separate PR.
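The dispatch-and-delegate shape described in the list above can be sketched roughly as follows. This is not the PR's code: it uses stdlib dataclasses as a stand-in for the Pydantic models so it runs without dependencies, `FakeSlicePG` is a hypothetical placeholder for the TPU-specific PG, the field names `topology`, `version`, and `chips_per_vm` are guesses based on the description, and `create_replica_placement_group` is only an illustrative analogue of `_create_replica_placement_group`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AcceleratorConfig:
    """Base type for accelerator-specific scheduling requirements."""


@dataclass(frozen=True)
class TPUAcceleratorConfig(AcceleratorConfig):
    # Field names are assumptions drawn from "topology, version, chips per VM".
    topology: str
    version: str
    chips_per_vm: int = 4


class FakeSlicePG:
    """Hypothetical stand-in for the TPU-specific PG; records lifecycle calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def release_head_pgs(self) -> None:
        self.calls.append("release_head_pgs")

    def shutdown(self) -> None:
        self.calls.append("shutdown")


class ReplicaPlacementGroup:
    """Wrapper that forwards lifecycle calls to the accelerator PG, if any."""

    def __init__(self, accelerator_pg: Optional[FakeSlicePG] = None) -> None:
        self.accelerator_pg = accelerator_pg

    def release_head_pgs(self) -> None:
        if self.accelerator_pg is not None:
            self.accelerator_pg.release_head_pgs()

    def shutdown(self) -> None:
        if self.accelerator_pg is not None:
            self.accelerator_pg.shutdown()


def create_replica_placement_group(
    accelerator_config: Optional[AcceleratorConfig],
) -> ReplicaPlacementGroup:
    """Dispatch on the config type and wrap the result."""
    if accelerator_config is None:
        # Default path: no accelerator-specific lifecycle to manage.
        return ReplicaPlacementGroup()
    if isinstance(accelerator_config, TPUAcceleratorConfig):
        return ReplicaPlacementGroup(FakeSlicePG())
    raise TypeError(f"Unsupported accelerator config: {accelerator_config!r}")
```

Keeping the wrapper's methods as no-ops on the default path means callers can invoke `shutdown()` and `release_head_pgs()` unconditionally, without branching on whether a deployment set `accelerator_config`.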
Related issues
#57137
Additional information